
    Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data

    Efficient processing of big geospatial data is crucial for tackling global and regional challenges such as climate change and natural disasters, but it is difficult not only because of the massive data volume but also because of the intrinsic complexity and high dimensionality of geospatial datasets. While traditional computing infrastructure does not scale well with the rapidly increasing data volume, Hadoop has attracted increasing attention in the geoscience community for handling big geospatial data. Many recent studies have investigated adopting Hadoop for processing big geospatial data, but how to adjust computing resources to efficiently handle dynamic geoprocessing workloads has barely been explored. To bridge this gap, we propose a novel framework that automatically scales a Hadoop cluster in the cloud so that the right amount of computing resources is allocated for the current geoprocessing workload. The framework and auto-scaling algorithms are introduced, and a prototype system was developed to demonstrate the feasibility and efficiency of the proposed scaling mechanism, using Digital Elevation Model (DEM) interpolation as an example. Experimental results show that this auto-scaling framework can (1) significantly reduce computing resource utilization (by 80% in our example) while delivering performance similar to that of a full-powered cluster, and (2) effectively handle spikes in the processing workload by automatically adding computing resources to ensure that processing finishes within an acceptable time. Such an auto-scaling approach provides a valuable reference for optimizing the performance of geospatial applications and addressing data- and computation-intensive challenges in GIScience in a more cost-efficient manner.
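    The abstract describes workload-driven scaling decisions without giving the algorithm. As a rough illustration only, here is a minimal Python sketch of one threshold-based scaling decision; ScalingPolicy, SLOTS_PER_NODE, and autoscale_once are hypothetical stand-ins, not the paper's auto-scaling algorithm.

    # Hypothetical threshold-based scaling decision (not the paper's algorithm).
    from dataclasses import dataclass

    SLOTS_PER_NODE = 4  # assumed task slots (or YARN containers) per worker

    @dataclass
    class ScalingPolicy:
        min_nodes: int = 2           # never shrink below this
        max_nodes: int = 32          # budget ceiling for the cloud cluster
        scale_out_load: float = 0.8  # load ratio that triggers growth
        scale_in_load: float = 0.2   # load ratio that triggers shrinkage
        step: int = 2                # nodes added or removed per decision

    def autoscale_once(pending, running, nodes, policy):
        """Return the desired node count for one monitoring interval."""
        capacity = max(nodes * SLOTS_PER_NODE, 1)
        load = (pending + running) / capacity
        if load > policy.scale_out_load and nodes < policy.max_nodes:
            return min(nodes + policy.step, policy.max_nodes)
        if load < policy.scale_in_load and nodes > policy.min_nodes:
            return max(nodes - policy.step, policy.min_nodes)
        return nodes

    print(autoscale_once(pending=60, running=8, nodes=4, policy=ScalingPolicy()))  # -> 6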

    A High Performance, Spatiotemporal Statistical Analysis System Based on a Spatiotemporal Cloud Platform

    With the increase in the size and complexity of spatiotemporal data, traditional methods for performing statistical analysis cannot meet real-time requirements for mining information from Big Data, due to both data- and computing-intensive factors. To address the Big Data challenges in geostatistics and to support decision-making, a high-performance spatiotemporal statistical analysis system (Geostatistics-Hadoop) is proposed in this paper. The proposed system has several features: (1) Hadoop is enhanced to handle spatial data in its native format and to execute a number of parallelized spatial analysis algorithms that solve practical geospatial analysis problems; (2) an Oozie-based workflow system eases the operation and sharing of spatial analysis services; and (3) a private cloud platform based on Eucalyptus provides on-the-fly, elastic computing resources. Experimental results show that Geostatistics-Hadoop rapidly mines and analyzes big spatiotemporal data sets with the support of elastic computing resources from the cloud platform. Adopting cloud computing and a Hadoop cluster to parallelize statistical calculations significantly improves the performance of Big Data analyses.
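    To make the parallelization concrete, the toy sketch below shows a map/reduce pair in plain Python computing one simple statistic (mean of an attribute per grid cell); the cell size, functions, and data are invented, and a real system would run such pairs as Hadoop jobs rather than in a single process.

    # Toy MapReduce-style aggregation in plain Python (illustrative only).
    from collections import defaultdict

    CELL = 1.0  # assumed grid-cell size in degrees

    def map_point(lon, lat, value):
        """Map phase: key each observation by its containing grid cell."""
        return (int(lon // CELL), int(lat // CELL)), (value, 1)

    def reduce_cell(pairs):
        """Reduce phase: combine (value, count) partials into a cell mean."""
        total = sum(v for v, _ in pairs)
        count = sum(c for _, c in pairs)
        return total / count

    points = [(-77.1, 38.9, 3.2), (-77.4, 38.2, 4.0), (-76.2, 39.1, 5.5)]
    groups = defaultdict(list)
    for lon, lat, val in points:  # "shuffle": group mapped pairs by cell key
        key, pair = map_point(lon, lat, val)
        groups[key].append(pair)
    print({k: reduce_cell(v) for k, v in groups.items()})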

    Improving NoSQL Spatial-Query Processing with Server-Side In-Memory R*-Tree Indexes for Spatial Vector Data

    Geospatial databases are basic tools for collecting, indexing, and managing georeferenced data in sustainability research, enabling efficient long-term analysis. NoSQL databases are increasingly applied to manage ever-growing massive spatial vector data (SVD) thanks to their flexible data schemas, agile scalability, and fast query response times. Spatial queries are basic operations in geospatial databases, and from the perspective of green information technology, an efficient spatial index can accelerate query processing and reduce power consumption for ubiquitous spatial applications. Current solutions tend to index spatial objects with space-filling curves or geohashes on NoSQL databases. The better-performing R-tree family, by contrast, has mainly been used in slow disk-based spatial access methods on NoSQL databases, which incur high loading and searching costs. Performing spatial queries efficiently with the R-tree family on NoSQL databases therefore remains a challenge. In this paper, an in-memory, balanced, and distributed R*-tree index named the BDRST index is proposed and implemented on HBase for efficient spatial-query processing of massive SVD. The BDRST index stores serialized R*-trees in HBase regions, distributed in the same table alongside the associated SVD partitions. Moreover, an optimized server-side parallel processing framework is presented for real-time R*-tree instantiation and query processing. Extensive experiments on real-world land-use data sets test the performance of our method, including index building, index quality, spatial queries, and applications. Our method outperforms other state-of-the-art solutions, saving between 27.36% and 95.94% of average execution time for the above operations. The experimental results demonstrate the capability of the BDRST index to support spatial queries over large-scale SVD, offering a solution for efficient sustainability research involving massive georeferenced data.
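    As a loose illustration of the idea of one in-memory R*-tree per data partition, the sketch below uses the Python rtree package (a libspatialindex binding) in place of the paper's serialized trees and HBase server-side processing; the partitions and features are invented.

    # One in-memory R*-tree per partition, queried in a fan-out (illustrative).
    from rtree import index

    def build_partition_index(features):
        """features: iterable of (feature_id, (minx, miny, maxx, maxy))."""
        props = index.Property()
        props.variant = index.RT_Star  # request the R*-tree split heuristic
        idx = index.Index(properties=props)
        for fid, bbox in features:
            idx.insert(fid, bbox)
        return idx

    partitions = {
        "region-a": [(1, (0, 0, 2, 2)), (2, (5, 5, 6, 6))],
        "region-b": [(3, (1, 1, 3, 3))],
    }
    indexes = {name: build_partition_index(f) for name, f in partitions.items()}
    query = (0.5, 0.5, 2.5, 2.5)  # a range query fans out to every partition tree
    print({name: list(ix.intersection(query)) for name, ix in indexes.items()})
    # Output lists candidate IDs per partition; exact geometry tests would follow.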

    Computational Fluid Dynamics Simulation of Combustion and Selective Non-Catalytic Reduction in a 750 t/d Waste Incinerator

    In this study, a Computational Fluid Dynamics (CFD) approach using Ansys Fluent 15.0 and FLIC software was employed to simulate the combustion process of a 750 t/d grate-type waste incinerator. The objective was to assess the performance of Selective Non-Catalytic Reduction (SNCR) technology in reducing nitrogen oxide (NOx) emissions. Two-stage simulations were conducted, predicting waste combustion on the grate bed and volatile-matter combustion in the furnace. The results effectively depicted the temperature and gas concentration distributions on the bed surface, along with the temperature, velocity, and composition distributions in the furnace. Comparison with field data validated the numerical model. The findings serve as a reference for optimizing large-scale incinerator operation and parameter design through CFD simulation.

    A SqueeSAR Spatially Adaptive Filtering Algorithm Based on Hadoop Distributed Cluster Environment

    Multi-temporal interferometric synthetic aperture radar (MT-InSAR) techniques analyze a study area using a time series of SAR images, achieving millimeter-level surface subsidence accuracy. To effectively acquire subsidence information in low-coherence areas without obvious features outside urban centers, the MT-InSAR technique SqueeSAR improves the density of subsidence points by incorporating distributed scatterers (DS). However, SqueeSAR filters DS points individually during spatially adaptive filtering, which demands significant memory, lowers processing efficiency, and poses great challenges for large-area InSAR processing. We propose a parallelization strategy for spatially adaptive filtering based on the Spark distributed computing engine in a Hadoop cluster environment, which splits the DS pixel data across computing nodes for parallel processing and effectively improves the filtering algorithm’s performance. To evaluate the effectiveness and accuracy of the proposed method, we conducted a performance evaluation and accuracy verification in and around the main city of Kunming using original Sentinel-1A SLC data provided by ESA. Parallel computation on a YARN cluster comprising three computing nodes improved the performance of the filtering algorithm by a factor of 2.15 without affecting filtering accuracy.
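    The parallelization pattern, splitting DS pixels across executors for independent filtering, can be sketched in PySpark as below; adaptive_filter is a placeholder computation, not the SqueeSAR estimator, and real input would be read from HDFS rather than built in the driver.

    # PySpark sketch of per-pixel parallel filtering (placeholder computation).
    from pyspark import SparkContext

    def adaptive_filter(pixel):
        """Stand-in for the per-DS spatially adaptive filter."""
        pid, amplitudes = pixel
        return pid, sum(amplitudes) / len(amplitudes)

    if __name__ == "__main__":
        sc = SparkContext(appName="ds-filter-sketch")
        # Each element: (pixel_id, time series of amplitudes).
        ds_pixels = [(i, [float(i + t) for t in range(5)]) for i in range(1000)]
        filtered = (sc.parallelize(ds_pixels, numSlices=12)  # split across nodes
                      .map(adaptive_filter)
                      .collect())
        print(filtered[:3])
        sc.stop()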

    High-Performance Overlay Analysis of Massive Geographic Polygons That Considers Shape Complexity in a Cloud Environment

    Overlay analysis is a common task in geographic computing, widely used in geographic information systems, computer graphics, and computer science. With breakthroughs in Earth observation technologies, particularly the emergence of high-resolution satellite remote sensing, geographic data have grown explosively, and the overlay analysis of massive, complex geographic data has become computationally intensive. Distributed parallel processing in a cloud environment provides an efficient solution to this problem. The cloud computing paradigm represented by Spark has become the standard for massive data processing in industry and academia due to its large scale and low latency, and it has attracted further attention as a means of solving overlay analysis over massive data. Existing studies, however, mainly focus on how to implement parallel overlay analysis in a cloud computing paradigm and pay less attention to the impact of spatial data graphic complexity on parallel computing efficiency, especially the data skew caused by differences in graphic complexity. Geographic polygons often have complex graphical structures, such as large numbers of vertices and composite structures containing holes and islands. When the Spark paradigm is used for the overlay analysis of massive geographic polygons, its efficiency is closely related to factors such as data organization and algorithm design. Considering the influence of polygon shape complexity on overlay-analysis performance, we design and implement a parallel processing algorithm based on the Spark paradigm. Building on an analysis of polygon shape complexity, overlay-analysis speed is improved via reasonable data partitioning, a distributed spatial index, a minimum-bounding-rectangle filter, and other optimizations, while high speed and parallel efficiency are maintained.
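    One way to counter complexity-induced skew is to balance partitions on total vertex count rather than feature count. The greedy sketch below illustrates that idea only; it is an assumption-laden stand-in, not the paper's partitioner.

    # Greedy complexity-balanced partitioning (illustrative stand-in).
    import heapq

    def partition_by_complexity(polygons, n_partitions):
        """polygons: list of (polygon_id, vertex_count) -> lists of IDs."""
        heap = [(0, p) for p in range(n_partitions)]  # (vertex load, index)
        heapq.heapify(heap)
        buckets = [[] for _ in range(n_partitions)]
        # Largest-first assignment keeps the heaviest shapes spread out.
        for pid, verts in sorted(polygons, key=lambda x: -x[1]):
            load, idx = heapq.heappop(heap)
            buckets[idx].append(pid)
            heapq.heappush(heap, (load + verts, idx))
        return buckets

    polys = [("a", 12000), ("b", 300), ("c", 9000), ("d", 450), ("e", 8000)]
    print(partition_by_complexity(polys, 2))  # [['a', 'd', 'b'], ['c', 'e']]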

    Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data, and effectively analyzing these data is essential for geoscience studies. However, the task is challenging because processing the massive volumes of data is both computing- and data-intensive: the analytics require complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics that leverages cloud computing, MapReduce, and Service-Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers; a MapReduce-based algorithm framework is developed to support the parallel processing of geoscience data; and a service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this framework significantly improves the efficiency of big geoscience data analytics by reducing data processing time and simplifying analytical procedures for geoscientists.
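    The service-oriented chaining can be pictured as steps passing a shared context, as in the minimal sketch below; the step functions are invented stand-ins, and real steps would submit MapReduce jobs and read or write HBase instead of computing locally.

    # Minimal workflow-chaining sketch (stand-in steps, not the framework's API).
    def subset(ctx):
        """Stand-in for retrieving a variable/region slice from HBase."""
        ctx["data"] = [1.0, 2.0, 3.0, 4.0]
        return ctx

    def anomaly(ctx):
        """Stand-in for a MapReduce anomaly computation."""
        mean = sum(ctx["data"]) / len(ctx["data"])
        ctx["anomaly"] = [v - mean for v in ctx["data"]]
        return ctx

    def run_workflow(steps):
        """Execute steps in order, passing a shared context between them."""
        ctx = {}
        for step in steps:
            ctx = step(ctx)
        return ctx

    print(run_workflow([subset, anomaly])["anomaly"])  # [-1.5, -0.5, 0.5, 1.5]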

    Building an Elastic Parallel OGC Web Processing Service on a Cloud-Based Cluster: A Case Study of Remote Sensing Data Processing Service

    Since the Open Geospatial Consortium (OGC) proposed the geospatial Web Processing Service (WPS), standard OGC Web Service (OWS)-based geospatial processing has become the major type of distributed geospatial application, and improving the performance and sustainability of these applications has become a dominant challenge. This paper presents the construction of an elastic parallel OGC WPS service on a cloud-based cluster, covering the design of a high-performance, cloud-based WPS service architecture, a cloud scalability scheme, and an elastic parallel geoprocessing algorithm. Experiments with a remote sensing data processing service demonstrate that our method provides a higher-performance WPS service while using fewer computing resources. It can also help institutions reduce hardware costs, raise hardware utilization, and conserve energy, which is important for building green and sustainable geospatial services and applications.
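    The elastic-parallel idea can be caricatured as tiling a scene and sizing a worker pool from the request load, as below; the tile computation and sizing rule are invented, pool workers stand in for cloud nodes, and the OGC WPS request/response plumbing is omitted.

    # Load-sized worker pool over scene tiles (illustrative stand-in).
    from concurrent.futures import ProcessPoolExecutor

    def process_tile(tile):
        """Placeholder per-tile computation (e.g., a band transform)."""
        tile_id, pixels = tile
        return tile_id, [p * 0.5 for p in pixels]

    def choose_workers(n_tiles, ceiling=8):
        """More pending tiles -> more workers, capped by a cluster ceiling."""
        return max(1, min(ceiling, n_tiles // 4))

    if __name__ == "__main__":
        tiles = [(i, [float(i)] * 4) for i in range(32)]
        with ProcessPoolExecutor(max_workers=choose_workers(len(tiles))) as pool:
            results = list(pool.map(process_tile, tiles))
        print(len(results), "tiles processed")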

    A service brokering and recommendation mechanism for better selecting cloud services.

    Cloud computing is becoming the new generation of computing infrastructure, and many cloud vendors provide different types of cloud services. Choosing the best cloud services for specific applications is very challenging: it requires balancing multiple factors such as business demands, technologies, policies, and preferences in addition to the computing requirements. This paper proposes a mechanism for selecting the best public cloud service at the Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) levels. A systematic framework and associated workflow cover cloud service filtration, solution generation, evaluation, and selection. Specifically, we propose a hierarchical information model for integrating heterogeneous cloud information from different providers, together with a corresponding cloud information collection mechanism; a cloud service classification model for categorizing and filtering cloud services, and an application requirement schema providing rules for creating application-specific configuration solutions; and a preference-aware solution evaluation model for evaluating and recommending solutions according to the preferences of application providers. To test the proposed framework and methodologies, a cloud service advisory tool prototype was developed and relevant experiments were conducted. The results show that the proposed system collects, updates, and records information from multiple mainstream public cloud services in real time; generates feasible cloud configuration solutions according to user specifications with acceptable cost prediction; assesses solutions from multiple aspects (e.g., computing capability, potential cost, and Service Level Agreement (SLA)); offers rational recommendations based on user preferences and practical cloud provisioning; and visually presents and compares solutions through an interactive web Graphical User Interface (GUI).
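    The preference-aware evaluation can be pictured as weighted scoring over normalized criteria, as in the toy sketch below; the criteria, weights, and candidate solutions are invented, not the paper's model.

    # Toy preference-weighted scoring of candidate cloud solutions.
    def score(solution, weights):
        """Weighted sum over normalized criteria (higher is better)."""
        return sum(weights[k] * solution[k] for k in weights)

    candidates = {
        # Criteria normalized to [0, 1]; the cost criterion is pre-inverted
        # so that cheaper solutions score higher.
        "provider-a-small": {"compute": 0.4, "cost": 0.9, "sla": 0.8},
        "provider-b-large": {"compute": 0.9, "cost": 0.3, "sla": 0.9},
    }
    prefs = {"compute": 0.5, "cost": 0.3, "sla": 0.2}  # user preference weights
    best = max(candidates, key=lambda name: score(candidates[name], prefs))
    print(best, round(score(candidates[best], prefs), 2))  # provider-b-large 0.72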